AITopics | model fusion

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada (0.04)

Genre: Research Report (0.46)

Industry:

Education (0.95)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)

Neural Information Processing Systems, Dec-24-2025, 22:12:37 GMT

Model Fusion via Optimal Transport

Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this approach is often infeasible under resource constraints on memory and computation, which grow linearly with the number of models. We present a layer-wise model fusion algorithm for neural networks that uses optimal transport to (soft-) align neurons across the models before averaging their associated parameters. We show that this can successfully yield one-shot knowledge transfer (i.e., without requiring any retraining) between neural networks trained on heterogeneous non-i.i.d.
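A minimal sketch of align-then-average for a single hidden layer, assuming two models of equal width, no biases, uniform mass on neurons, and a squared-distance cost between the neurons' incoming weight vectors. The soft alignment comes from entropy-regularized optimal transport (log-domain Sinkhorn iterations). The function names, the cost, and `reg` are illustrative choices, not the paper's exact procedure.

```python
import numpy as np


def _logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_uniform(cost: np.ndarray, reg: float = 0.01, n_iter: int = 500) -> np.ndarray:
    """Entropy-regularized OT plan between two uniform measures (log-domain Sinkhorn)."""
    n, m = cost.shape
    log_a, log_b = -np.log(n), -np.log(m)
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # Alternately rescale so that row sums are 1/n and column sums are 1/m.
        f = reg * (log_a - _logsumexp((g[None, :] - cost) / reg, axis=1))
        g = reg * (log_b - _logsumexp((f[:, None] - cost) / reg, axis=0))
    return np.exp((f[:, None] + g[None, :] - cost) / reg)


def fuse_one_hidden_layer(w1_a, w2_a, w1_b, w2_b, reg=0.01):
    """Align model B's hidden neurons to model A's, then average the weights.

    w1_*: (hidden, in) incoming weights; w2_*: (out, hidden) outgoing weights.
    """
    hidden = w1_a.shape[0]
    # Cost of matching A-neuron i with B-neuron j: squared distance of incoming weights.
    cost = ((w1_a[:, None, :] - w1_b[None, :, :]) ** 2).sum(axis=-1)
    # Plan rows sum to 1/hidden; rescaling gives a (soft) permutation matrix.
    align = hidden * sinkhorn_uniform(cost, reg=reg)
    w1_b_aligned = align @ w1_b    # reorder B's rows into A's neuron order
    w2_b_aligned = w2_b @ align.T  # apply the same reordering to B's outgoing columns
    return (w1_a + w1_b_aligned) / 2, (w2_a + w2_b_aligned) / 2
```

If model B is model A with its hidden units shuffled, naive averaging mixes unrelated neurons, while the aligned average recovers A. Smaller `reg` pushes the plan toward a hard permutation; larger values give a softer alignment.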

model fusion, name change, optimal transport

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.83)

Shih, Yi-Jen, Harwath, David

Unifying Model and Layer Fusion for Speech Foundation Models

arXiv.org Artificial Intelligence, Nov-12-2025

Speech Foundation Models have recently gained significant attention. Prior work has shown that fusing representations from multiple layers of the same model, or fusing multiple models, can improve performance on downstream tasks. We unify these two fusion strategies by proposing an interface module that enables fusion across multiple upstream speech models while integrating information across their layers. We conduct extensive experiments on different self-supervised and supervised models across various speech tasks, including ASR and paralinguistic analysis, and demonstrate that our method outperforms prior fusion approaches. We further analyze its scalability with respect to model size and model count, highlighting the importance of selecting appropriate upstream models. Our results show that the proposed interface provides an additional performance boost given a suitable choice of upstream models, making it a promising approach for utilizing Speech Foundation Models.
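The layer-fusion strategy from prior work is commonly implemented as a learnable softmax-weighted sum over an upstream model's hidden layers (as in SUPERB-style evaluation). The sketch below shows that step, followed by a hypothetical multi-model step (per-model linear projection to a shared width, then averaging). It is not the interface module proposed in the paper; `fuse_upstream_models` and the projection matrices are illustrative assumptions.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()


def weighted_layer_sum(hidden_states: np.ndarray, layer_logits: np.ndarray) -> np.ndarray:
    """Collapse (num_layers, time, dim) hidden states into (time, dim) with softmax layer weights."""
    weights = softmax(layer_logits)
    return np.tensordot(weights, hidden_states, axes=1)


def fuse_upstream_models(per_model_states, per_model_logits, projections):
    """Hypothetical multi-model step: layer-fuse each model, project to a shared width, average."""
    fused = [
        weighted_layer_sum(states, logits) @ proj
        for states, logits, proj in zip(per_model_states, per_model_logits, projections)
    ]
    return np.mean(np.stack(fused), axis=0)
```

With all layer logits at zero, the layer sum reduces to a plain mean over layers; training the logits lets the downstream task choose which layers matter.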

artificial intelligence, fusion, machine learning

2511.08389

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

arXiv.org Artificial Intelligence, Oct-23-2025

InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models

Gu, Yanggan, Wang, Yuanyi, Yan, Zhaoyi, Zhang, Yiming, Zhou, Qi, Wu, Fei, Yang, Hongxia

Model fusion combines multiple Large Language Models (LLMs) with different strengths into a more powerful, integrated model through lightweight training methods. Existing work on model fusion focuses primarily on supervised fine-tuning (SFT), leaving preference alignment (PA), a critical phase for enhancing LLM performance, largely unexplored. The few existing fusion methods for the PA phase, such as WRPO, simplify the process by using only the response outputs of source models while discarding their probability information. To address this limitation, we propose InfiFPO, a preference optimization method for implicit model fusion. InfiFPO replaces the reference model in Direct Preference Optimization (DPO) with a fused source model that synthesizes multi-source probabilities at the sequence level, which avoids the complex vocabulary alignment required by previous work while preserving the probability information. By introducing probability clipping and max-margin fusion strategies, InfiFPO enables the pivot model to align with human preferences while effectively distilling knowledge from the source models. Comprehensive experiments on 11 widely used benchmarks demonstrate that InfiFPO consistently outperforms existing model fusion and preference optimization methods. With Phi-4 as the pivot model, InfiFPO improves its average performance on the 11 benchmarks from 79.95 to 83.33, significantly improving its capabilities in mathematics, coding, and reasoning tasks.
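To make "replacing the reference model in DPO" concrete, the sketch below computes the standard DPO loss from sequence-level log-probabilities and feeds it reference log-probs taken from source models. The fusion rule shown (per example, use the source with the largest chosen-minus-rejected margin) is a hypothetical reading of "max-margin fusion" for illustration only; probability clipping is omitted, and neither function reproduces InfiFPO's exact formulation.

```python
import numpy as np


def log_sigmoid(x: np.ndarray) -> np.ndarray:
    # log(sigmoid(x)) = -log(1 + exp(-x)), computed stably.
    return -np.logaddexp(0.0, -x)


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss on sequence-level log-probabilities (arrays over a batch)."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return float(np.mean(-log_sigmoid(beta * margin)))


def fused_reference(src_chosen, src_rejected):
    """HYPOTHETICAL fusion rule: per example, take the source model whose
    chosen-minus-rejected log-prob margin is largest. src_*: (num_sources, batch)."""
    best = np.argmax(src_chosen - src_rejected, axis=0)
    cols = np.arange(src_chosen.shape[1])
    return src_chosen[best, cols], src_rejected[best, cols]
```

Because the reference enters only through sequence-level log-probabilities, source models with different tokenizers can be combined without aligning their vocabularies, which is the property the abstract highlights.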

large language model, machine learning, natural language

2505.13878

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Alsheikh, Ahmad, Fischer, Andreas

Fusion-Based Neural Generalization for Predicting Temperature Fields in Industrial PET Preform Heating

arXiv.org Artificial Intelligence, Oct-8-2025

Accurate and efficient temperature prediction is critical for optimizing the preheating of PET preforms in industrial microwave systems prior to blow molding. We propose a novel deep learning framework for generalized temperature prediction. Unlike traditional models that require extensive retraining for each material or design variation, our method introduces a data-efficient neural architecture that leverages transfer learning and model fusion to generalize to unseen scenarios. By pretraining specialized neural regressors on distinct conditions, such as recycled PET heat capacities or varying preform geometries, and integrating their representations into a unified global model, we create a system capable of learning shared thermal dynamics across heterogeneous inputs. The architecture incorporates skip connections to enhance stability and prediction accuracy. Our approach reduces the need for large simulation datasets while achieving superior performance compared to models trained from scratch. Experimental validation on two case studies (material variability and geometric diversity) demonstrates significant improvements in generalization, establishing a scalable ML-based solution for intelligent thermal control in manufacturing environments. Moreover, the approach shows how data-efficient generalization strategies can extend to other industrial applications involving complex physical modeling with limited data.
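The abstract does not specify the architecture in detail. As a rough illustration of integrating specialist representations into a global model with a skip connection, the sketch below uses frozen one-layer tanh encoders as stand-ins for the pretrained specialist regressors and fits only a linear fusion head plus a linear input skip path by least squares. `FusedRegressor` and all shapes are hypothetical; the paper trains deep networks rather than a closed-form head.

```python
import numpy as np


class FusedRegressor:
    """Global regressor over frozen specialist encoders plus a linear skip path from the input."""

    def __init__(self, specialists):
        # specialists: list of (weight, bias) pairs for one-layer tanh encoders,
        # assumed to have been pretrained elsewhere and kept frozen here.
        self.specialists = specialists
        self.w_head = None
        self.w_skip = None

    def _features(self, x):
        # Concatenate the representations produced by every specialist encoder.
        return np.concatenate([np.tanh(x @ w + b) for w, b in self.specialists], axis=-1)

    def fit(self, x, y):
        """Fit only the fusion head and skip weights (least squares); encoders stay frozen."""
        feats = self._features(x)
        design = np.concatenate([feats, x], axis=-1)
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        k = feats.shape[-1]
        self.w_head, self.w_skip = coef[:k], coef[k:]
        return self

    def predict(self, x):
        return self._features(x) @ self.w_head + x @ self.w_skip
```

Training only the small fusion layer on top of reused encoders is one way such a design can need fewer new simulation samples than training a full model from scratch.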

artificial intelligence, deep learning, machine learning

2510.05394

Country:

Europe > Germany (0.04)
Europe > Croatia > Primorje-Gorski Kotar County > Rijeka (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals > Polymers & Plastics (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Oct-2-2025, 06:58:20 GMT

Ensemble Distillation for Robust Model Fusion in Federated Learning

Lin, Tao

Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized.
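The excerpt stops after defining FL. As background, a common server-side aggregation is FedAvg (sample-weighted parameter averaging); ensemble distillation instead uses the clients' averaged predictions on unlabeled data as a teacher for the fused server model. The sketch below shows both ingredients under that reading; the function names are illustrative, and the server-side optimization loop (minimizing the KL term over the student's parameters) is omitted.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """Average parameter arrays, weighting each client by its number of samples."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, client_params))


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_targets(client_logits):
    """Teacher for server-side distillation: average the clients' logits, then softmax.
    client_logits: (num_clients, batch, num_classes) on shared unlabeled inputs."""
    return softmax(np.mean(client_logits, axis=0))


def distillation_kl(teacher_probs, student_logits):
    """Mean KL(teacher || student) over a batch of unlabeled examples."""
    log_student = np.log(softmax(student_logits))
    return float(np.mean(np.sum(teacher_probs * (np.log(teacher_probs) - log_student), axis=-1)))
```

A FedAvg result can serve as the student's initialization, after which the server reduces `distillation_kl` on unlabeled data.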

artificial intelligence, communication round, machine learning

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada (0.04)

Genre: Research Report (0.46)

Industry:

Education (0.95)
Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Yang, Zichuan, Wang, Yongzhi

EVM-Fusion: An Explainable Vision Mamba Architecture with Neural Algorithmic Fusion

arXiv.org Artificial Intelligence, Aug-27-2025

Medical image classification is critical for clinical decision-making, yet meeting the demands for accuracy, interpretability, and generalizability remains challenging. This paper introduces EVM-Fusion, an Explainable Vision Mamba architecture featuring a novel Neural Algorithmic Fusion (NAF) mechanism for multi-organ medical image classification. EVM-Fusion uses a multipath design in which DenseNet- and U-Net-based pathways, enhanced by Vision Mamba (Vim) modules, operate in parallel with a traditional feature pathway. These diverse features are dynamically integrated via a two-stage fusion process: cross-modal attention followed by the iterative NAF block, which learns an adaptive fusion algorithm. Intrinsic explainability is embedded through path-specific spatial attention, Vim Δ-value maps, traditional feature SE-attention, and cross-modal attention weights. Experiments on a diverse 9-class multi-organ medical image dataset demonstrate EVM-Fusion's strong classification performance, with 99.75% test accuracy, and provide multi-faceted insights into its decision-making process, highlighting its potential for trustworthy AI in medical diagnostics.
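The first fusion stage, cross-modal attention, can be illustrated with generic scaled dot-product attention over one feature vector per pathway; the resulting attention matrix is the kind of weight map the abstract lists as an explainability signal. This is a minimal sketch with hypothetical projection matrices `wq`, `wk`, `wv` and mean pooling; it does not model the NAF block or the paper's exact attention design.

```python
import numpy as np


def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_path_attention(path_feats, wq, wk, wv):
    """path_feats: (num_paths, dim), one pooled feature vector per pathway.
    Each pathway attends to all pathways; returns a fused vector and the attention weights."""
    q, k, v = path_feats @ wq, path_feats @ wk, path_feats @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (num_paths, num_paths)
    fused_tokens = attn @ v
    return fused_tokens.mean(axis=0), attn
```

Row `i` of `attn` shows how much pathway `i` draws on each of the other pathways, which can be inspected per image.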

artificial intelligence, evm-fusion, machine learning

2505.17367

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Shanghai > Shanghai (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.94)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pereira, Luiz, Amini, M. Hadi

Heterogeneous Federated Reinforcement Learning Using Wasserstein Barycenters

arXiv.org Artificial Intelligence, Jun-23-2025

In this paper, we first propose a novel model fusion algorithm that uses Wasserstein barycenters to train a global Deep Neural Network (DNN) in a distributed architecture. To this end, we divide the dataset into equal parts and distribute them to "agents" that have identical deep neural networks, each of which trains only on the part assigned to it (its local dataset). After some training iterations, we perform an aggregation step that combines the weight parameters of all neural networks using Wasserstein barycenters. These steps form the proposed algorithm, referred to as FedWB. We then build on these processes to develop an algorithm for Heterogeneous Federated Reinforcement Learning (HFRL). Our test problem is the CartPole toy problem, in which we vary the pole lengths to create heterogeneous environments. We train a deep Q-Network (DQN) in each environment to control its cart, occasionally performing a global aggregation step to generalize the local models; the end result is a global DQN that functions across all environments.
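To show what a Wasserstein barycenter computes, the sketch below covers the one case with a simple closed form: 1-D empirical distributions with the same number of samples, where the W2 barycenter is obtained by sorting each sample set and averaging position-wise (quantile averaging). How FedWB maps network weights to distributions is not described in this excerpt, so this is an illustration of the barycenter itself, not FedWB's aggregation step.

```python
import numpy as np


def wasserstein_barycenter_1d(samples, weights=None):
    """W2 barycenter of 1-D empirical measures with equal sample counts.

    samples: (num_measures, num_points). Sorting gives each measure's quantiles;
    the barycenter's quantiles are their weighted average.
    """
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    if weights is None:
        weights = np.full(len(sorted_samples), 1.0 / len(sorted_samples))
    return np.average(sorted_samples, axis=0, weights=weights)
```

For the samples `[0, 1, 2]` and `[4, 3, 2]`, position-wise averaging would collapse everything to `[2, 2, 2]`, whereas the barycenter `[1, 2, 3]` preserves the spread of both distributions.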

artificial intelligence, deep learning, machine learning

2506.15825

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Energy (1.00)
Information Technology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, May-26-2025, 21:02:11 GMT

Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains. Despite its conceptual simplicity, fine-tuning entails several troublesome engineering choices, such as selecting hyperparameters and determining checkpoints from an optimization trajectory. To tackle the difficulty of choosing the best model, one effective solution is model fusion, which combines multiple models in a parameter space. However, we observe a large discrepancy between loss and metric landscapes during the fine-tuning of pre-trained language models. Building on this observation, we introduce a novel model fusion technique that optimizes both the desired metric and loss through multi-objective Bayesian optimization.
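Parameter-space fusion with two objectives can be sketched as a convex combination of checkpoint parameters plus a search over the mixing weights that keeps the Pareto-optimal candidates. The sketch below deliberately substitutes plain random search over the simplex for the paper's multi-objective Bayesian optimization; `evaluate` is a hypothetical callback returning objectives to minimize, for example `(val_loss, -val_metric)`, reflecting the observation that loss and metric can disagree.

```python
import numpy as np


def fuse(params, lam):
    """Convex combination of model parameter vectors; params: (num_models, dim), lam on the simplex."""
    return np.tensordot(lam, params, axes=1)


def pareto_front(objectives):
    """Indices of non-dominated rows, with every objective minimized."""
    obj = np.asarray(objectives, dtype=float)
    keep = []
    for i, row in enumerate(obj):
        # Row i is dominated if some row is <= in every objective and < in at least one.
        dominated = np.any(np.all(obj <= row, axis=1) & np.any(obj < row, axis=1))
        if not dominated:
            keep.append(i)
    return keep


def search_fusion_weights(params, evaluate, n_trials=64, seed=0):
    """Random search over mixing weights (a stand-in for multi-objective BO)."""
    rng = np.random.default_rng(seed)
    lams = rng.dirichlet(np.ones(len(params)), size=n_trials)
    objs = [evaluate(fuse(params, lam)) for lam in lams]
    return [lams[i] for i in pareto_front(objs)]
```

A Bayesian optimizer would replace the Dirichlet sampling with a surrogate model that proposes promising `lam` values, which matters when each `evaluate` call is an expensive validation run.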

artificial intelligence, machine learning, natural language

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.66)
Information Technology > Artificial Intelligence > Machine Learning (0.45)